Skip to content

Branch fs spi review#62024

Merged
morningman merged 2 commits intoapache:branch-fs-spifrom
morningman:branch-fs-spi-review
Apr 2, 2026
Merged

Branch fs spi review#62024
morningman merged 2 commits intoapache:branch-fs-spifrom
morningman:branch-fs-spi-review

Conversation

@morningman
Copy link
Copy Markdown
Contributor

No description provided.

morningman and others added 2 commits April 2, 2026 06:59
…odule

### What problem does this PR solve?

Issue Number: N/A

Problem Summary: Code review of fe/fe-filesystem/ identified 2 critical bugs,
5 major issues, and 5 minor issues. This commit fixes all of them:

**Critical bugs fixed:**
- ObjFileSystem.exists(): called location.withoutScheme() which strips the URI
  scheme, causing S3Uri/AzureUri parsing to throw on every exists() call.
  Fixed to use location.uri() (full URI with scheme).
- S3FileSystem.globListWithLimit(): pagination loop exited prematurely after
  hitting the limit, so nextMatchAfterLimit was only searched in the remaining
  items of the current S3 page. Fixed to continue paginating until the boundary
  key is found.

**Major issues fixed:**
- RemoteObject.modificationTime() renamed to getModificationTime() for JavaBean
  consistency with the other getters (getKey, getSize, getEtag, getRelativePath).
- S3ObjStorage.buildClient(): removed hard requirement for PROP_ENDPOINT;
  when absent the AWS SDK uses region-only routing for native AWS S3 access.
- BrokerInputStream FD leak: added clientInvalidated flag; close() now skips
  the closeReader RPC when the client was already invalidated by a prior RPC
  failure, preventing a double-invalidate and broker-side FD leak.
- DFSFileSystem.list(): replaced eager listStatus(path) (returns full FileStatus[])
  with lazy listStatusIterator(path) backed by HdfsFileIterator; each hasNext()
  and next() call runs inside the HadoopAuthenticator context for Kerberos safety.
- COS/OBS/OSS providers: added explicit _STORAGE_TYPE_ check so VPC/custom-domain
  endpoints are correctly routed without relying on domain-name pattern matching.

**Minor issues fixed:**
- RemoteIterator, SimpleRemoteIterator, FileSystemIOException removed from SPI
  (unused; FileIterator already covers the same purpose).
- RequestBody.content() no longer declares throws IOException (no IO occurs).
- LocalSeekableInputStream.read() and seek() now throw IOException when closed.
- AzureFileSystem.rename() throws IOException instead of UnsupportedOperationException
  for directory renames, matching the FileSystem contract.
- BrokerFileSystemProvider.create(): null check added for BROKER_PORT before
  Integer.parseInt to give a clear error message instead of NPE.

### Release note

Fix ObjFileSystem.exists() always failing due to withoutScheme(); fix S3 glob
pagination losing results after limit; support AWS region-only S3 access without
an explicit endpoint; fix broker reader FD leak on RPC failure; HDFS listing is
now lazy/streaming.

### Check List (For Author)

- Test: Manual compile verification (all 10 fe-filesystem modules BUILD SUCCESS)
- Behavior changed: Yes — S3 clients no longer require PROP_ENDPOINT for native
  AWS S3; ObjFileSystem.exists() now works correctly; HDFS list is lazy
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: N/A

Problem Summary:
fe-filesystem-spi mixed consumer-facing API types (FileSystem, Location,
FileEntry, etc.) with provider-facing SPI contracts (ObjStorage,
HadoopAuthenticator, etc.) in the same module and package
(org.apache.doris.filesystem.spi). This made it impossible to depend on
just the consumer API without pulling in all provider-implementation
contracts.

### Changes

Created a new fe-filesystem-api module with package org.apache.doris.filesystem,
containing the 16 consumer-facing types:
- FileSystem, FileSystemProvider (plugin discovery interface)
- Location, FileEntry, BlockInfo (value types)
- FileIterator, DorisInputFile, DorisInputStream, DorisOutputFile (I/O types)
- GlobListing, FileSystemType, FileSystemUtil, FileSystemTransferUtil (utilities)
- FileSystemIOException, RemoteIterator, SimpleRemoteIterator (exceptions/iteration)

Retained in fe-filesystem-spi (package org.apache.doris.filesystem.spi),
the 9 provider-only contracts:
- ObjFileSystem, ObjStorage, HadoopAuthenticator, IOCallable
- RemoteObject, RemoteObjects, RequestBody, StsCredentials, UploadPartResult

All 8 impl modules (S3/Azure/HDFS/Broker/COS/OBS/OSS/Local) and fe-core
have their imports updated to the new package.

META-INF/services files renamed from org.apache.doris.filesystem.spi.FileSystemProvider
to org.apache.doris.filesystem.FileSystemProvider to match the moved interface.

build.sh updated to include fe-filesystem-api in the build module list and
exclude it from plugin dependency copy (it is on the main FE classpath).

### Release note

None

### Check List (For Author)

- Test: Build verified with ./build.sh --fe -j4 → Successfully build Doris
- Behavior changed: No (API-compatible refactor, same public contracts)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman morningman merged commit d60105b into apache:branch-fs-spi Apr 2, 2026
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants